Topic words analysis based on LDA model

نویسندگان

  • Xi Qiu
  • Christopher Stewart
چکیده

Social network analysis (SNA), which is a research field describing and modeling the social connection of a certain group of people, is popular among network services. Our topic words analysis project is a SNA method to visualize the topic words among emails from Obama.com to accounts registered in Columbus, Ohio. Based on Latent Dirichlet Allocation (LDA) model, a popular topic model of SNA, our project characterizes the preference of senders for target group of receptors. Gibbs sampling is used to estimate topic and word distribution. Our training and testing data are emails from the carbon-free server Datagreening.com. We use parallel computing tool BashReduce for word processing and generate related words under each latent topic to discovers typical information of political news sending specially to local Columbus receptors. Running on two instances using paralleling tool BashReduce, our project contributes almost 30% speedup processing the raw contents, comparing with processing contents on one instance locally. Also, the experimental result shows that the LDA model applied in our project provides precision rate 53.96% higher than TF-IDF model finding target words, on the condition that appropriate size of topic words list is selected.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Sentiment Analysis with Global Topics and Local Dependency

With the development of Web 2.0, sentiment analysis has now become a popular research problem to tackle. Recently, topic models have been introduced for the simultaneous analysis for topics and the sentiment in a document. These studies, which jointly model topic and sentiment, take the advantage of the relationship between topics and sentiment, and are shown to be superior to traditional senti...

متن کامل

رفع ابهام معنایی واژگان مبهم فارسی با مدل موضوعی LDA

Word sense disambiguation is the task of identifying the correct sense for the word in a given context among a finite set of possible sense. In this paper a model for farsi word sense disambiguation is presented. The model use two group of features: first, all word and stop words around target word and topic models as second features. We extract topics from a farsi corpus with Latent Dirichlet ...

متن کامل

Web Event Topic Analysis by Topic Feature Clustering and Extended LDA Model

To analyze topics of a large number of web events, we proposed an event topic analysis approach by topic feature clustering and extended LDA (latent dirichlet allocation) model. The extended LDA model is dimension LDA (DLDA) which integrates topic probability of LDA model. We represent an event as a multi-dimensions vector and use DLDA model to select topic feature words in events. We aggregate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1405.3726  شماره 

صفحات  -

تاریخ انتشار 2014